{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Copyright 2019 Google LLC\n",
    "# \n",
    "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
    "# you may not use this file except in compliance with the License.\n",
    "# You may obtain a copy of the License at\n",
    "#\n",
    "#     https://www.apache.org/licenses/LICENSE-2.0\n",
    "#\n",
    "# Unless required by applicable law or agreed to in writing, software\n",
    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
    "# See the License for the specific language governing permissions and\n",
    "# limitations under the License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a target=\"_blank\" href=\"https://colab.research.google.com/github/GoogleCloudPlatform/keras-idiomatic-programmer/blob/master/notebooks/estimating_your_training_utilization.ipynb\">\n",
    "<img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Managing your CPU/GPU utilization during training with Warm-Up\n",
    "\n",
    "## Objective\n",
    "\n",
    "This notebook demonstrates a simple manner for you to estimate and control your CPU/GPU utilization during training. Currently training infrastructure does not do auto-scaling (unlike batch prediction). Instead, you sent your utilization strategy as part of starting your training job.\n",
    "\n",
    "If your training on the cloud, a poor utilization may result in an under or over utilization. In under utilization, you're leaving compute power (money) on the table. In over utilization, the training job may become bottleneck or excessively interrupted by other processes.\n",
    "\n",
    "Things you might consider when under utilizing. Do I scale up (larger instances) or do I scale out (distributed training). \n",
    "\n",
    "In this notebook, we will use short training runs (warm-start) combined with the psutil module to see what our utilization will be when we do a full training run. Since we are only interested in utilization, we don't care what the accuracy is --we can just use a defacto (best guess) on hyperparameters.\n",
    "\n",
    "In my experience, I find the sweetspot for utilization on a single instance is 70%. That leaves enough compute power from background processes pre-empting the training and if training headless, to be able to ssh in and monitor the system."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Imports\n",
    "\n",
    "We will be using tensorflow and the psutil module. This notebook will work with both TF 1.X and TF 2.0."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow\n",
    "import psutil"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get Dataset\n",
    "\n",
    "Let's use the MNIST dataset (for brevity) as if this is the dataset you will use it for training. We will draw from the dataset during the warm-start training in the same manner that we plan to do in the later full training. In this case, because the total data is small enough to fit it into memory, we load the whole dataset into memory as a multi-dimensional numpy array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the builtin MNIST dataset\n",
    "from tensorflow.keras.datasets import mnist\n",
    "import numpy as np\n",
    "(x_train, y_train), (x_test, y_test) = mnist.load_data()\n",
    "\n",
    "# Normalize the data\n",
    "x_train = (x_train / 255.0).astype(np.float32)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Make Model\n",
    "\n",
    "We will use the `create_model()` function to create simple DNN models. DNN models are sufficient for training a MNIST model. This model will make N layers (`n_layers`) of the same number of nodes (`n_nodes`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from tensorflow.keras.layers import Flatten, Dense\n",
    "from tensorflow.keras import Sequential\n",
    "\n",
    "def create_model(n_layers, n_nodes):\n",
    "    model = Sequential()\n",
    "    model.add(Flatten(input_shape=(28, 28)))\n",
    "    for _ in range(n_layers):\n",
    "        model.add(Dense(n_nodes, activation='relu'))\n",
    "    model.add(Dense(10, activation='softmax'))\n",
    "    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])\n",
    "    return model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Do a warm start to view the CPU/GPU utilization\n",
    "\n",
    "### Small Model, 1 layer 128 nodes\n",
    "\n",
    "Okay, let's start. In our first test, we try a model with one hidden dense layer of 128 nodes.\n",
    "\n",
    "We then do a `psutil.cpu_percent(interval=None)`. By setting that parameter `interval=None`, we set a checkpoint (start point) for measuring our CPU/GPU utilization.\n",
    "\n",
    "We then train the model for a couple (`epochs=2`) of epochs. If your datasets are huge and drawn from storage, you might want to use just a sub-distribution from your data (i.e., a smaller amount of the dataset) by setting the `steps_per_epoch` parameter.\n",
    "\n",
    "Once the training finishes, we then issue a `psutil.cpu_percent(interval=None, percpu=True)`. This will report the CPU/GPU utilization on all CPUs/GPUs on the instance since the start interval checkpoint. We then do a `psutil.cpu_percent(interval=None)` to show the average utilization across all the CPUs/GPUs.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = create_model(1, 128)\n",
    "model.summary()\n",
    "\n",
    "import psutil\n",
    "set_interval = psutil.cpu_percent(interval=None)\n",
    "model.fit(x_train, y_train, epochs=2, verbose=1)\n",
    "print(psutil.cpu_percent(interval=None, percpu=True), psutil.cpu_percent(interval=None))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Larger Model, 2 layers 1024 nodes\n",
    "\n",
    "On our next example, we will make the model 16X more computationally expensive by having two hidden layers of 1024 nodes each."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = create_model(2, 1024)\n",
    "model.summary()\n",
    "\n",
    "set_interval = psutil.cpu_percent(interval=None)\n",
    "model.fit(x_train, y_train, epochs=2, verbose=1, workers=2)\n",
    "print(psutil.cpu_percent(interval=None, percpu=True), psutil.cpu_percent(interval=None))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Even Larger Model, 4 layers 2048 nodes\n",
    "\n",
    "In our last example, we will make the model 128X more computationally expensive by having two hidden layers of 1024 nodes each."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = create_model(4, 2048)\n",
    "model.summary()\n",
    "\n",
    "set_interval = psutil.cpu_percent(interval=None)\n",
    "model.fit(x_train, y_train, epochs=2, verbose=1)\n",
    "print(psutil.cpu_percent(interval=None, percpu=True), psutil.cpu_percent(interval=None))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
